Autoimmune Hepatitis (AIH) is a chronic inflammatory liver disorder whose accurate diagnosis requires the concurrent evaluation of clinical, serological, and histopathological data. The diagnostic process remains inherently complex because it depends on expert interpretation of liver biopsy specimens and clinical reports, both of which are time-intensive and prone to inter-observer variability. This paper presents an AI-driven multimodal diagnostic framework that integrates clinical text data with histopathological biopsy images to assist clinicians in the evaluation of AIH. The system is deployed as a web-based application that accepts structured patient records and biopsy images as inputs, processes them through a multimodal large language model (MLLM), and generates two forms of output: a clinician-oriented structured diagnostic support report and a simplified patient-facing explanation. Evaluation on simulated diagnostic scenarios demonstrates an overall diagnostic accuracy of 0.88, precision of 0.91, and recall of 0.85, with an average inference time under one minute per case. These results underscore the potential of multimodal AI architectures to enhance diagnostic efficiency and improve patient communication in hepatology.
Introduction
This paper proposes a Multimodal Large Language Model (MLLM)-based diagnostic support system for Autoimmune Hepatitis (AIH), a chronic immune-mediated liver disease that can progress to cirrhosis and liver failure if left untreated. Diagnosing AIH is challenging because it requires integrating clinical history, laboratory findings, autoantibody profiles, immunoglobulin levels, and liver biopsy results. Traditional diagnostic methods depend heavily on expert interpretation and are subject to variability, particularly in assessing liver biopsy features.
The proposed system addresses these challenges by combining clinical records and digitized liver biopsy images within a single AI framework. Unlike conventional AI systems that analyze either medical images or text separately, the multimodal model performs joint reasoning over both data types, improving diagnostic completeness and accuracy. The system also generates patient-friendly explanations alongside clinician-oriented reports to support better communication and shared decision-making.
The literature review highlights previous work using Convolutional Neural Networks (CNNs) for liver biopsy image analysis and Natural Language Processing (NLP) for extracting information from electronic health records. Although these approaches have shown promising results, they operate independently and fail to exploit the complementary information available from multiple data sources. Recent multimodal AI models have demonstrated success in other medical specialties, but their application to hepatology remains limited.
The proposed system consists of a web-based client–server architecture with four main modules: data input, backend processing, multimodal inference, and report generation. Clinical information is submitted through structured forms or PDF reports, while liver biopsy images are uploaded separately. The backend extracts relevant laboratory values, preprocesses biopsy images, and encodes both text and image data into a unified representation. A multimodal large language model then jointly analyzes biochemical markers, autoantibody profiles, and histopathological features to estimate the likelihood of AIH and generate structured diagnostic reports aligned with established AIH diagnostic criteria.
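The backend steps described above (extracting laboratory values, encoding the image, and assembling one combined input) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's actual implementation: the field names (`alt_u_l`, `ast_u_l`, `igg_g_l`), the report phrasing matched by the regular expressions, and the `build_request` payload layout are all hypothetical. The call to the multimodal model is omitted because the paper does not specify the model or its API.

```python
import base64
import re
from dataclasses import dataclass

# Hypothetical patterns: assumes report lines such as "ALT: 182 U/L".
LAB_PATTERNS = {
    "alt_u_l": re.compile(r"\bALT\s*[:=]\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
    "ast_u_l": re.compile(r"\bAST\s*[:=]\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
    "igg_g_l": re.compile(r"\bIgG\s*[:=]\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
}


@dataclass
class CaseInput:
    """One submitted case: free-text clinical report plus raw biopsy image bytes."""
    report_text: str
    biopsy_image: bytes


def extract_labs(report_text: str) -> dict:
    """Return the lab values found in the text; values not found map to None."""
    labs = {}
    for name, pattern in LAB_PATTERNS.items():
        match = pattern.search(report_text)
        labs[name] = float(match.group(1)) if match else None
    return labs


def build_request(case: CaseInput) -> dict:
    """Bundle text, extracted labs, and the base64-encoded image into one payload."""
    return {
        "clinical_text": case.report_text,
        "labs": extract_labs(case.report_text),
        "image_b64": base64.b64encode(case.biopsy_image).decode("ascii"),
        # The two output forms named in the paper (key names are assumed).
        "outputs": ["clinician_report", "patient_explanation"],
    }
```

Returning `None` for a value that is not found, rather than substituting a default number, keeps missing data visible to the downstream model and to the clinician reading the report.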
Evaluation using simulated clinical scenarios representing different AIH presentations demonstrated promising performance, achieving 88% diagnostic accuracy, 91% precision, and 85% recall. The system processed each case in under one minute, making it suitable for integration into outpatient clinical workflows. It produced coherent clinician reports, appropriately handled uncertain or conflicting findings, and generated clear, patient-friendly explanations. However, the study acknowledges limitations, including the use of simulated data, a relatively small test dataset, and the lack of real-world clinical validation.
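For reference, the three reported metrics follow from confusion-matrix counts as in the short sketch below. The counts in the usage example are made up for illustration and are not the study's data; the function name `classification_metrics` is likewise hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("at least one case is required")
    return {
        # Accuracy: fraction of all cases classified correctly.
        "accuracy": (tp + tn) / total,
        # Precision: of the cases flagged as AIH, the fraction that truly are AIH.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: of the true AIH cases, the fraction the system flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Illustrative counts only (not taken from the paper).
metrics = classification_metrics(tp=17, fp=2, fn=3, tn=18)
print({name: round(value, 3) for name, value in metrics.items()})
```

Because precision and recall share the same numerator, a precision (0.91) higher than recall (0.85) implies that false negatives outnumbered false positives in the evaluated scenarios.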
Conclusion
This paper presents an AI-driven multimodal diagnostic support framework for autoimmune hepatitis that integrates clinical text records with histopathological biopsy images through a unified MLLM inference pipeline. The system demonstrates technically viable performance on simulated diagnostic scenarios, achieving 0.88 accuracy and sub-minute inference latency. Its dual-output architecture addresses both clinical decision support and patient health communication requirements.
The principal contribution lies in the application of multimodal AI to a diagnostically challenging and understudied hepatological condition, establishing a methodological foundation for subsequent real-world validation.